1. Introduction


As part of the transformation and modernization process, the U.S. Army has sponsored 
efforts ranging from large-scale experiments and evaluations (Advanced Warfighting 
Experiments [AWEs] and Network Integration Evaluation [NIE]) to smaller Advanced 
Technology Demonstrations (ATDs) that evaluate the maturity of a technology and assess its 
potential application to a military need. This research supports the integration of Soldier 
operators, maintainers, and trainers with weapons, vehicle, and communications systems. 

Field experimentation provides important insights in this regard, for many of the operational and 
human factors issues affected by technology do not appear in isolated tests. Rather, the full 
implications, limitations, and strengths of a system emerge only when that system is employed in 
concert with other systems under demanding field conditions. As a result, field experimentation 
represents an important tool for assessing the interoperability of new systems with existing 
systems and functional areas. 

For field experimentation to be effective, it is important to have a measurement framework and 
corresponding instrumentation to assess how new technology affects operator, staff, and 
organizational performance. To meet this need, one goal of the U.S. Army Research 
Laboratory’s (ARL’s) Human Research and Engineering Directorate (HRED) is to support the 
Human Dimension (HD) Major Laboratory Program (MLP). HD was formalized by the U.S. 
Army Training and Doctrine Command (TRADOC) and is a major Army effort designed to 
support evaluations of the integration between Soldiers and communications systems, weapons, 
and vehicles (1). 

HRED is responsible for evaluating Soldier-system performance to make certain Soldiers are 
equipped with systems they can operate proficiently with minimal risk. A major objective of this 
effort has been to develop and execute a systems engineering approach along with standardized 
field-operational Soldier performance metrics to quantify and validate integrated 
Soldier-information systems performance on the digital battlefield. 

The current HD effort involves expanding previous Human Factors (HF) metrics and analyses to 
develop an understanding of how the commander and the battle staff use all the Army Battle 
Command System (ABCS) and Network Operation subsystems to support effective battle 
command. This effort began with a task analysis, but moved to a more total systems perspective 
considering how effectively new technology supported the commander and staff and, more 
generally, fit the operational needs of the larger organization from a Soldier in a Squad to one in 
the Division. The purpose of this paper is to describe this approach, with an emphasis on the 
measurement framework and psychometric properties of the questionnaire developed thus far 
from this research. The framework and questionnaire have been used successfully to support 
NIEs, Limited User Tests (LUTs), Initial Operational Test and Evaluation (IOT&E), Advanced 
Warfighting Experiments (Joint Contingency Force, Division Capstone Exercise I and II), and 
related Force XXI and Army Transformation activities. 


2. Approach 


We used a three-step approach to move from a task-based to a more global systems approach for 
examining usability, functionality, and performance from operator, commander and staff, and 
organizational perspectives. The first step was identifying tasks and behavioral characteristics 
that were associated with effective battle command performance at the operator, staff, and 
organizational levels. The second step was developing a framework for measuring how well 
technology provides the usability, functionality, and performance required at each of the three 
levels. The third step was developing questions using Likert scales, cue cards, and tools to 
implement the measurement framework. This section briefly describes this three-step approach. 
The following section presents data demonstrating the questionnaire’s reliability when used to 
evaluate the Advanced Field Artillery Tactical Data System (AFATDS) version 6.5.0. 

2.1 Step 1: Moving From Tasks to Consider the HD of Battle Command 

We began by using the Universal Joint Task List (UJTL) (2) to identify essential Command and 
Control (C2) tasks. The UJTL is organized into four separate parts according to the level of war: 
(1) Strategic level—National military tasks, (2) Theater-level tasks, (3) Operational-level tasks, 
and (4) Tactical-level tasks. Each task in the UJTL is individually indexed to reflect its 
placement in the structure. Thus, the UJTL provides a standard reference system for users to 
address and report requirements, capabilities, or issues and as such formed the command staff 
task baseline around which ARL developed its standardized Soldier performance metrics 
research. 

However, battle command itself is a process conducted by an organization consisting of system 
operators, Soldiers, battle staff, and leaders in a tactical environment working toward a common 
goal. To assess digital battle command and network operations system performance and 
effectiveness, one must understand battle command as a human process that involves the 
complex interaction of cognitive and socio-organizational factors supported by digitization and 
the environment (see figure 1). Such assessments move beyond simple task level descriptions of 
battle command and involve the need to investigate specific battle command organizational 
proficiencies. 
Figure 1. Total systems engineering approach (variables influencing battle command). 

These proficiencies are reflected in specific patterns of organizational, staff, and operator-level 
behaviors that allow the sequence of decision-making tasks associated with battle command to 
be carried out effectively, efficiently, and with an appropriate level of adaptability to different 
mission sets. To better understand these behaviors, ARL developed a survey based on staff 
proficiencies focusing on the interrelationships between staff functions or processes required for 
effective C2 decision making. In particular, ARL’s survey metrics methodology established a 
cross-linking of FM 101-5 (5) military decision-making processes (MDMP) with the ABCS 
software modules thought to support critical command and staff task execution. FM 101-5 states 
that a staff supports the “science of control” in four primary ways: (1) gathers and provides 
information to the commander, (2) makes estimates of the set of actions required, (3) prepares 
plans and orders, and (4) measures organization behavior. To perform this type of support, the 
staff and commanders use various time-dependent decision making and information management 
processes that require extensive staff coordination between and within echelons. It was assumed 
that the ABCS and network operations subsystem software function capabilities were developed 
to support these human-centered C2 processes and avoid errors in judgment and timing. Based 
on the survey and guidance from TRADOC, 12 HDs of digitized battle command operations 
(figure 2) were selected from which to extend the previously developed HF measurement 
framework so that it also would measure the effects of technology on the HDs of battle 
command. 
Figure 2. Three human factors areas of digitization system assessment. 

2.2 Step 2: Measurement Framework 

As a lead agency for Human Factors Integration, ARL participated in the performance evaluation 
of new versions of the ABCS during several NIEs and operational tests. The NIEs are 
evaluations of emerging systems in operational scenarios that allow us to demonstrate 
interoperability early in the development life cycle. A set of performance metrics was developed 
to support those evaluations. These metrics were based on the measurement framework shown 
in figure 2, which was built after completing step no. 1 in our previously described systems 
engineering approach. 

The measurement framework (figure 2) is called the System User Performance Evaluation 
Profile, or Profile for short. This framework and corresponding questions used to implement it 
were used to collect data to determine if the ABCS met operator, commander and staff, and 
organizational objectives and needs. These metrics represented the performance measures for 
the prototype operational system and were a key part of the system requirements validation 
process. The Profile has three branches of performance metrics. The first branch, called 
Hardware and Software, represents conventional hardware and software test and verification 
criteria. Rushby (4) referred to them as the basic service requirements of a system. The service 
requirements or criteria in the Profile are important for military systems, and build upon an 
earlier set developed by Adelman and Ulvila (5). These criteria measure, for example, how 
portable the new system is from one location to another, how survivable it is, how good the 
documentation is, how easy the system is to set up and integrate with other systems, how fast and 
reliable it is, and so forth. 

The second branch in the Profile measurement framework is the “usability and training factor.” 
System usability has a direct impact on staff performance because shortcomings in system 
usability lead to underlying error patterns, attention deficits, and excessive workload which can 
be linked to inappropriate decisions and priorities, serious delays in operational tempo, and 
failures in effective staff coordination and communications. The usability metrics were 
guided by human-computer system issues that have been described in the research literature 
(e.g., 6, 7) as reflecting hardware and software design with good interface usability. The 
usability characteristics included whether the computer system: (1) contains simple and natural 
dialogue, (2) reflects doctrine in its applications, (3) “speaks” the user’s language, (4) minimizes 
user memory load, (5) remains consistent between different modules and across applications, (6) 
provides user feedback, (7) provides clearly marked exits from modules, (8) provides process 
shortcuts, and (9) prevents errors. 

Questions regarding conventional human factors (HF) considerations (e.g., screen 
display contrast, symbol color, and screen layout) were also included in the survey. These 
criteria build upon an earlier set found in Adelman and Riedel (5). For in-depth, specific HF 
design standards, one can also refer to MIL-STD-1472G (9). This standard establishes 
general human engineering criteria for design and development of military systems, equipment, 
and facilities. 

The purpose of this standard is to present human engineering design criteria, principles, and 
practices to be applied in the design of systems, equipment, and facilities so as to: (1) achieve 
required performance by operator, control, and maintenance personnel, (2) achieve required 
manpower readiness for system performance, (3) achieve required reliability of 
personnel-equipment combinations, and (4) foster design standardization within and among 
systems (9). 

The third branch of the Profile measures how well the system supports the battle command 
functions of the operator, staff and leader, and larger organization. This branch of the 
measurement framework builds directly on the HD research performed in step no. 1 of our 
systems engineering approach and utilizes some of the performance criteria found in Adelman 
and Riedel (5). In particular, the Performance branch measures the system’s effect on the ability 
of the operator, staff and leader, and larger organization to perform their work in a timely way, 
and on the quality of their processes and resulting products. In addition, the Performance branch 
measures the system’s interoperability with other systems, both horizontally and vertically within 
the organization. Lastly, it obtains overall measures of the system’s functionality and effect on 
staff proficiency. 
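
As a rough illustration of the three-branch structure described above, the Profile could be held in a small data structure like the one sketched here. The class name, variable names, and the exact wording of each criterion are assumptions paraphrased from the text; the actual Profile may group or label its criteria differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Branch:
    """One branch of the Profile and the criteria grouped under it."""
    name: str
    criteria: List[str] = field(default_factory=list)


# Hypothetical encoding: criterion labels are paraphrased from the text,
# not taken from the actual Profile instrument.
PROFILE: List[Branch] = [
    Branch("Hardware and Software", [
        "portability", "survivability", "documentation quality",
        "ease of setup and integration", "speed", "reliability",
    ]),
    Branch("Usability and Training", [
        "simple and natural dialogue", "applications reflect doctrine",
        "speaks the user's language", "minimal user memory load",
        "consistency across modules and applications", "user feedback",
        "clearly marked exits", "process shortcuts", "error prevention",
    ]),
    Branch("Performance", [
        "timeliness of work", "process quality", "product quality",
        "horizontal and vertical interoperability",
        "overall functionality", "effect on staff proficiency",
    ]),
]


def criteria_per_branch(profile: List[Branch]) -> Dict[str, int]:
    """Return the number of criteria listed under each branch."""
    return {branch.name: len(branch.criteria) for branch in profile}


for branch_name, count in criteria_per_branch(PROFILE).items():
    print(f"{branch_name}: {count} criteria")
```

Keeping the branches as data rather than hard-coding them makes it straightforward to attach questions to each criterion later.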

2.3 Step 3: Questionnaire for Implementing the Profile Measurement Framework 

Questions were developed to measure the effect of ABCS system components on each of the 
criteria in the Profile measurement framework. The questions were designed so that only the 
name of the ABCS system component needed to be changed, thereby permitting the same 
questions to be used as metrics for evaluating all components. This permitted a standard frame 
of reference for measuring the components, so that the HF and HD effects of each component 
could be evaluated relative to other ABCS components. This approach has been used previously 
to evaluate the relative strengths and weaknesses of decision support and expert system 
components (10). 
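
The idea of reusing one question set across components can be sketched as a set of templates with a placeholder for the component name. The template wording and the component name "SYSTEM-X" below are hypothetical and serve only to show the mechanism; they are not the questionnaire's actual items.

```python
from typing import Dict, List

# Hypothetical question templates keyed by the criterion they measure.
# Only the {component} placeholder changes between questionnaires.
TEMPLATES: Dict[str, str] = {
    "staff_planning": "How well does the {component} support staff planning?",
    "staff_collaboration": "How well does the {component} support staff collaboration?",
    "user_feedback": "How well does the {component} provide feedback to the user?",
}


def build_questionnaire(component: str) -> List[str]:
    """Fill every template with the name of the component under evaluation."""
    return [text.format(component=component) for text in TEMPLATES.values()]


# The same templates yield directly comparable questions for two components.
for question in build_questionnaire("AFATDS") + build_questionnaire("SYSTEM-X"):
    print(question)
```

Because every component is rated against identical wording, scores on the same criterion can be compared across components.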

Participants answered each question using a 5-point Likert scale. Their responses measured how 
well they thought the system performed on that metric. The questions were always written so 
that higher scores meant that the system was performing better on the metric. This was done so 
that respondents could complete the questionnaire quickly given the limited time available for 
data collection during the AWEs and not be confused by changes in the directionality of the 
wording of the question. 
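
A minimal sketch of how such responses might be scored follows, assuming the five response labels shown in figure 3. The function names and the sample responses are illustrative assumptions, not part of the original instrument.

```python
from statistics import mean
from typing import Dict, List

# Response labels as shown in figure 3; a higher score always means
# the system performed better on the metric.
LIKERT_SCALE: Dict[str, int] = {
    "Very Poor": 1,
    "Poor": 2,
    "Adequate": 3,
    "Good": 4,
    "Very Good": 5,
}


def to_scores(labels: List[str]) -> List[int]:
    """Convert response labels to their numeric Likert scores."""
    return [LIKERT_SCALE[label] for label in labels]


def mean_rating(labels: List[str]) -> float:
    """Average numeric rating for one question across respondents."""
    return mean(to_scores(labels))


# Hypothetical responses from four operators to a single question.
sample = ["Good", "Very Good", "Adequate", "Good"]
print(to_scores(sample), mean_rating(sample))
```

Because no item is reverse-worded, scores can be averaged directly without recoding.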

Two examples are presented in figure 3 for evaluating the Advanced Field Artillery Tactical 
Data System (AFATDS). Both questions measure AFATDS’ effect on the staff and leader’s 
process quality in the third branch of the Profile measurement framework. The first question 
measures AFATDS’ effect on staff planning, and the second question measures AFATDS’ effect 
on staff collaboration. 

How user-friendly is the use of the AFATDS for fire support planning processing? 

(1) Very Poor (2) Poor (3) Adequate (4) Good (5) Very Good 

How user-friendly is it to use the AFATDS (White Board) for conferencing? 

(1) Very Poor (2) Poor (3) Adequate (4) Good (5) Very Good 


Figure 3. AFATDS questionnaire example. 
3. Application 


Questionnaires based on our systems engineering approach were developed to evaluate each 
component of the ABCS during exercises, as previously described. This section presents data 
showing the internal and inter-rater reliability of the questionnaire used to evaluate AFATDS, 
and demonstrates the technical adequacy of the measurement instrument. Similar technical 
analyses of the questionnaires for the other net-centric subsystems are currently underway and 
will be reported in later documents. We begin this section by presenting the requirements 
statement for ABCS and then provide a brief description of the role of AFATDS. After doing so, 
we present the reliability results for the questionnaire measuring AFATDS. 

The U.S. Army Battle Command System Capstone Requirements Document (CRD), revision 3a (11) 
describes the objective system as follows: 

The objective ABCS will provide seamless real or near real time C4I capabilities which 
increase the lethality and information dominance of friendly forces from the strategic echelon 
to the foxhole across all spectrums of conflict. The ABCS will allow commanders to utilize 
dominant firepower systems more effectively to destroy enemy forces in an extended area of 
operations while protecting friendly forces. The firepower will be enhanced by providing the 
commander the ability to make quicker, more accurate decisions, and orchestrate combat 
power at critical times and places faster than an adversary. Additionally, the ABCS will 
enhance SA and enable friendly forces to share a CTP while communicating and targeting in 
real or near-real time. The ABCS will reduce the uncertainty of war situations, decrease 
decision-making time, and contribute to increased lethality, survivability, and operational 
tempo while reducing the potential for fratricide. The objective ABCS will be 
computationally intensive not communications intensive. It will employ a common computer 
architecture and communications hardware, a core set of common support software, and 
software which is functionally unique to each sub-system. 

ABCS is an evolving "system of systems" that needs individual subsystem testing and 
evaluation. The entire family of systems will be assessed individually and collectively to 
ensure that the functional requirements are met as well as the overarching commander 
decision-making and human dimension requirements contained in this document. 

The Advanced Field Artillery Tactical Data System is the fire support component of ABCS and 
provides automated decision support for fire support (FS), to include joint and combined fires 
(12). AFATDS supports the planning, coordination, control, and execution of close support, 
counter fire, interdiction, deep operations, and suppression of enemy air defense. It is a single 
integrated fire support asset manager. It provides decision aids and an information system for 
the synchronization of all types of fire support means. 
4. AFATDS Questionnaire Validation 


This section presents data regarding the internal and inter-rater reliability of the questionnaire 
used to evaluate AFATDS. The internal reliability (or consistency) of the questionnaire was 
measured by the Cronbach Alpha statistic. This statistic is based on the premise that if one has 
successfully sampled items from the hypothetical population of items measuring the same 
construct, then the responses to these items should be correlated highly. The inter-rater 
reliability of the questionnaire was measured by correlating the respondents’ answers to the 
questions. The correlations were calculated to determine the extent of agreement among the 
respondents. Consistent with Gable and Wolf (13), the goal was to ensure that the questionnaire 
had high internal consistency and then determine if there was high agreement (or disagreement) 
among the respondents about the value of AFATDS. 
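
The inter-rater step can be sketched as a Pearson correlation computed for every pair of respondents over their answers to the same questions. The respondent labels and ratings below are made-up illustrations, not data from the study.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two rating vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(
    ratings: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of respondents across all questions."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical 1-5 ratings from three respondents on five questions.
example = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 4, 5, 5],
    "C": [5, 4, 3, 2, 1],
}
for pair, r in pairwise_correlations(example).items():
    print(pair, round(r, 2))
```

With 15 respondents this procedure produces 105 distinct pairs, or 14 comparisons per respondent.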

Fifteen AFATDS operators completed the questionnaire, providing an adequate number of users. 
The calculated Cronbach Alpha value was 0.81. Since this value was higher than 0.70, which is 
routinely used as the necessary Cronbach Alpha value (13), we concluded that the AFATDS 
questionnaire had an acceptable level of internal reliability. 
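
For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the sample matrix are illustrative assumptions; the report does not state which software was used for the original calculation.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of ratings.

    Each inner list holds one respondent's answers to all k items.
    Sample variances (n - 1 denominator) are used throughout.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(scores[0])
    item_columns = list(zip(*scores))  # transpose to one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 4 respondents by 3 items on a 1-5 scale.
sample_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(round(cronbach_alpha(sample_scores), 2))
```

Alpha approaches 1.0 when respondents who rate one item highly also rate the other items highly, which is the sense of internal consistency used here.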

Table 1 presents the inter-rater reliability correlations among the 15 respondents. (Respondents 
R04 and R10 did not complete the questionnaire.) The correlations are Pearson product-moment 
correlations, which can range from +1.0 indicating perfect agreement to -1.0 indicating perfect 
disagreement. A Pearson product-moment correlation (r) of 0.39 is significantly different from a 
correlation of 0.0, which means no agreement, at the p = 0.05 (alpha) level for a t-test with 17 
degrees of freedom, two less than the number of questions. Eight of the 15 respondents (R01, 
R02, R03, R05, R06, R08, R12, and R15) had significant inter-rater correlations (r > 0.39) with 
every other respondent; that is, for all 14 comparisons. Three other respondents (R07, R11, 
and R14) had significant comparisons for 13 of the 14 possible comparisons with all other 
respondents. So, in total, 73% (11 of 15) of the AFATDS respondents had at least 13 
out of a possible 14 (93%) significant comparisons (inter-rater correlations). This means that the 
large majority of the responding AFATDS operators significantly agreed with each other when 
using the questionnaire to evaluate AFATDS, and that the questionnaire had an adequate level of 
inter-rater reliability. 
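
The 0.39 threshold can be checked with the standard relation between a correlation and its t statistic, t = r * sqrt(df / (1 - r^2)), which inverts to r = t / sqrt(t^2 + df). The sketch below uses critical t values taken from a standard t table for 17 degrees of freedom; these table values are supplied here as assumptions, and the observation that 0.39 corresponds to the one-tailed value is an inference, since the report does not state whether the test was one- or two-tailed.

```python
import math

DF = 17  # degrees of freedom quoted in the text (number of questions - 2)

# Critical t values for df = 17 from a standard t table (assumed inputs).
T_CRIT_ONE_TAILED_05 = 1.740
T_CRIT_TWO_TAILED_05 = 2.110


def r_to_t(r: float, df: int) -> float:
    """t statistic for testing a correlation r against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


def critical_r(t_crit: float, df: int) -> float:
    """Smallest correlation that reaches the given critical t value."""
    return t_crit / math.sqrt(t_crit * t_crit + df)


print(round(critical_r(T_CRIT_ONE_TAILED_05, DF), 3))
print(round(critical_r(T_CRIT_TWO_TAILED_05, DF), 3))
```

Under these assumptions the one-tailed critical value rounds to 0.39, matching the threshold in the text, while a two-tailed test at the same level would require a correlation of about 0.46.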

In general, we thought that the level of agreement was moderate, not high. To reach this 
conclusion, we assumed that a “high” correlation had to be at least 0.71 because that correlation 
would account for 50% of the variation between two respondents’ scores. Only three (20%) of 
the respondents (R01, R02, and R12) had 7 or more inter-rater correlations > 0.71 for the 14 
possible comparisons. Since 73% of the respondents had significant correlations (r > 0.39) for at 
least 13 of their 14 comparisons, but only 20% had high correlations (r > 0.71) for 50% of their 
comparisons, we concluded that there was, in general, a significant but only moderate level of 
agreement among the responding AFATDS operators. Nevertheless, the questionnaire met the 
requirements for inter-rater reliability. 
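
The reasoning behind the 0.71 cutoff is that the squared correlation gives the proportion of shared variance, and 0.71 squared is approximately 0.50. A small sketch of the resulting three-way reading of a correlation follows; the function names and the category labels are illustrative, and the two thresholds are the ones quoted in the text.

```python
SIGNIFICANT_R = 0.39  # significance threshold quoted in the text
HIGH_R = 0.71         # "high" threshold: about 50% shared variance


def shared_variance(r: float) -> float:
    """Proportion of variance two respondents' scores have in common."""
    return r * r


def agreement_level(r: float) -> str:
    """Label a correlation using the two thresholds from the text."""
    if r >= HIGH_R:
        return "high"
    if r >= SIGNIFICANT_R:
        return "moderate"
    return "not significant"


# Three values that appear in Table 1.
for r in (0.91, 0.52, 0.13):
    print(r, agreement_level(r), round(shared_variance(r), 2))
```

The sketch treats a correlation equal to a threshold as meeting it, which is an assumption; the text writes the comparisons with a strict inequality.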


Table 1. AFATDS correlation matrix for inter-rater reliability. 


Resp   R01   R02   R03   R05   R06   R07   R08   R09   R11   R12   R13   R14   R15   R16   R17
R01   1.00     -     -     -     -     -     -     -     -     -     -     -     -     -     -
R02   0.84  1.00     -     -     -     -     -     -     -     -     -     -     -     -     -
R03   0.70  0.66  1.00     -     -     -     -     -     -     -     -     -     -     -     -
R05   0.79  0.71  0.52  1.00     -     -     -     -     -     -     -     -     -     -     -
R06   0.80  0.81  0.59  0.59  1.00     -     -     -     -     -     -     -     -     -     -
R07   0.64  0.60  0.52  0.45  0.60  1.00     -     -     -     -     -     -     -     -     -
R08   0.67  0.65  0.82  0.42  0.68  0.59  1.00     -     -     -     -     -     -     -     -
R09   0.65  0.66  0.44  0.46  0.77  0.48  0.54  1.00     -     -     -     -     -     -     -
R11   0.75  0.71  0.48  0.58  0.60  0.79  0.55  0.58  1.00     -     -     -     -     -     -
R12   0.80  0.84  0.67  0.60  0.82  0.77  0.74  0.61  0.79  1.00     -     -     -     -     -
R13   0.69  0.62  0.42  0.44  0.66  0.57  0.41  0.41  0.64  0.69  1.00     -     -     -     -
R14   0.78  0.91  0.69  0.55  0.76  0.62  0.72  0.50  0.67  0.79  0.72  1.00     -     -     -
R15   0.77  0.71  0.64  0.49  0.68  0.74  0.52  0.52  0.68  0.67  0.75  0.69  1.00     -     -
R16   0.54  0.40  0.54  0.39  0.48  0.14  0.46  0.30  0.16  0.47  0.13  0.23  0.33  1.00     -
R17   0.64  0.75  0.57  0.48  0.60  0.54  0.63  0.39  0.45  0.67  0.60  0.80  0.59  0.23  1.00


It is also interesting to note that sometimes there was a very high level of agreement between two 
AFATDS operators. In particular, R02 and R14 had an inter-rater correlation of 0.91. This 
means that one essentially could use either of these two respondents to predict the ratings of the 
other. At the other extreme, R13 and R16 had an inter-rater correlation of only 0.13, indicating 
that they hardly agreed with each other at all. Since the results reported above indicate that the 
questionnaire has acceptable levels of internal and inter-rater reliability, future research will try 
to more fully understand why operators disagree with each other when evaluating the same 
ABCS components. 


5. Conclusion 


This report describes the systems engineering approach that ARL has used to develop 
performance metrics for evaluating digitization for the U.S. Army Battle Command. The 
approach emphasized measurement of system effects on the human dimensions of battle 
command at the operator, commander and staff, and organizational levels. The measurement 
framework assessed digitization usability and performance as well as traditional hardware and 
software verification criteria. The questionnaire developed to implement the framework has 
passed the requirements of internal and inter-rater reliability. Future research will make use of 
the questionnaire to evaluate other ABCS components and identify ways for improving not only 
those components, but also our overall measurement approach. 